Optimal Stochastic Search and Adaptive Momentum

Neural Information Processing Systems

Stochastic optimization algorithms typically use learning rate schedules that behave asymptotically as $\mu(t) = \mu_0/t$. The ensemble dynamics (Leen and Moody, 1993) for such algorithms provides an easy path to results on mean squared weight error and asymptotic normality. We apply this approach to stochastic gradient algorithms with momentum. We show that at late times, learning is governed by an effective learning rate $\mu_{\text{eff}} = \mu_0/(1 - \beta)$, where $\beta$ is the momentum parameter. We describe the behavior of the asymptotic weight error and give conditions on $\mu_{\text{eff}}$ that ensure optimal convergence speed.
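The effective-learning-rate claim can be checked numerically with a small sketch. This is not the paper's code. It assumes a one-dimensional quadratic cost $E(w) = \lambda w^2/2$, additive gradient noise of variance $\sigma^2$, the heavy-ball momentum update $w_{t+1} = w_t + \beta (w_t - w_{t-1}) - \mu(t)(\lambda w_t + \xi_t)$, and the schedule $\mu(t) = \mu_0/t$ starting at $t = 1$. Instead of averaging random runs, it iterates the exact second-moment recursion for $E[w_t^2]$, in the spirit of tracking ensemble averages directly. The function names `momentum_msq_error` and `loglog_slope` are hypothetical. The "predicted" slopes assume the standard plain-SGD result for $1/t$ annealing ($E[w^2] \propto 1/t$ when $2\mu_0\lambda > 1$, and $\propto t^{-2\mu_0\lambda}$ otherwise) with $\mu_0$ replaced by $\mu_{\text{eff}}$.

```python
import math


def momentum_msq_error(mu0, beta, lam, sigma2, w0, steps):
    """Exact ensemble mean-squared weight error for SGD with momentum.

    Assumed setting (illustrative, not taken from the paper):
      * 1-D quadratic cost E(w) = lam * w**2 / 2, optimum at w = 0
      * gradient samples lam * w + xi, with xi zero-mean, variance sigma2
      * heavy-ball update
            w[t+1] = w[t] + beta * (w[t] - w[t-1]) - mu(t) * (lam * w[t] + xi[t])
      * annealed learning rate mu(t) = mu0 / t for t = 1, 2, ...

    The update is linear in (w[t], w[t-1]), so the second moments
    A = E[w_t^2], B = E[w_t w_{t-1}], C = E[w_{t-1}^2] obey a closed,
    deterministic recursion and no random sampling is needed.

    Returns a list msq with msq[k] = E[w_k^2] for k = 0 .. steps + 1.
    """
    a = b = c = w0 * w0            # start at rest: w_1 = w_0 = w0
    msq = [a, a]
    q = -beta                      # coefficient on w[t-1]
    for t in range(1, steps + 1):
        mu = mu0 / t
        p = 1.0 + beta - mu * lam  # coefficient on w[t]
        # Right-hand side uses the old (a, b, c) values.
        a, b, c = (
            p * p * a + 2.0 * p * q * b + q * q * c + mu * mu * sigma2,
            p * a + q * b,
            a,
        )
        msq.append(a)
    return msq


def loglog_slope(msq, t):
    """Slope of log E[w^2] versus log t, measured between t and 2t."""
    return math.log(msq[2 * t] / msq[t]) / math.log(2.0)


if __name__ == "__main__":
    T = 100_000
    lam, sigma2, beta = 1.0, 1.0, 0.5
    for mu0 in (0.5, 0.1):
        mu_eff = mu0 / (1.0 - beta)
        msq = momentum_msq_error(mu0, beta, lam, sigma2, w0=1.0, steps=2 * T)
        # Assumed plain-SGD rule with mu0 -> mu_eff: slope -1 if 2*mu_eff*lam > 1,
        # otherwise the slower slope -2*mu_eff*lam.
        predicted = -1.0 if 2.0 * mu_eff * lam > 1.0 else -2.0 * mu_eff * lam
        print(f"mu0={mu0}  mu_eff*lam={mu_eff * lam:.2f}  "
              f"measured slope={loglog_slope(msq, T):+.3f}  predicted={predicted:+.3f}")
```

With $\beta = 0.5$, $\mu_0 = 0.5$ gives $\mu_{\text{eff}}\lambda = 1$, so the measured slope should sit near $-1$ (the $1/t$ rate), while $\mu_0 = 0.1$ gives $\mu_{\text{eff}}\lambda = 0.2$ and a slower slope near $-0.4$. The same $\mu_0$ without momentum would fall on the slow side of the threshold in both cases, which is the sense in which momentum rescales the condition from $\mu_0$ to $\mu_{\text{eff}}$ under these assumptions.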